Abstract: Formal methods such as model checking and theorem
proving have recently been regarded as effective tools for
software verification. In practice, however, these techniques
suffer from high complexity. Combining static analysis with
dynamic checking to address this problem is an emerging trend,
which has led to the concolic testing technique and its
variations. However, analysis-based verification techniques
assume that the full source code of the verified program is
available, which does not always hold in real-life contexts. In
this paper, we propose an approach to tackle this problem. Our
contributions are (i) combining function specifications with
control-flow analysis to deal with source-missing functions;
(ii) generating self-complete programs from incomplete programs
by means of concrete execution, thus making them fully
verifiable by model checking; and (iii) developing a
constraint-based test-case generation technique that
significantly reduces the complexity. Our solution proved
viable when deployed for checking the programming work of
students.

Index Terms — specification-based model checking, concolic 
testing, constraint-based test-case generation, incomplete 
programs 

I. Introduction 

Recently, formal methods such as model checking [1] and
theorem proving [2] have been increasingly applied to software
verification. Although these techniques have been shown, at
least theoretically, to be effective at identifying and
explaining real bugs in a program, they suffer from the state
explosion problem even in some simple yet non-trivial
verification cases. One notable approach to tackling this
problem is to adopt test-case generation techniques from
software testing, so that only a sufficient set of inputs is
produced when performing model-based verification of programs.
In particular, concolic testing, a hybrid software verification
technique that interleaves concrete execution (testing on
particular inputs) with symbolic execution [7], has emerged as
an efficient test-case generation technique. DART [3],
SYNERGY [4], DUALIZER [5] and DASH [6] are notable recent
approaches based on this idea.

However, those techniques require the whole source code of
the verified program to be available for analysis. This
assumption does not always hold in practical software
development, particularly when the program invokes library
functions provided in binary form.

In this paper, this situation is regarded as the verification of
an incomplete program¹, for which we propose a framework. In
doing so, we make the following key research contributions:

• Handling the problem of analyzing source-missing
functions by combining function specifications with the program
control flow to produce combined constraints that sufficiently
cover all possible scenarios.

• Using concrete execution to replace function invocations
by the generated output values. Thus, incomplete programs are
transformed into self-complete programs that are fully
available for model checking.

• An algorithm called CTG_E (Efficient Constraint-based
Test-case Generation) that generates combined constraints in
linear time, instead of the exponential time suffered by the
brute-force approach.

The rest of the paper is organized as follows. Section II
gives some relevant background. Section III discusses a
motivating example. In Section IV, we present our general
verification framework. The CTG_E algorithm is presented in
Section V. Section VI gives experimental results and
Section VII concludes the paper.

II. Background: Model Checking and The Concolic 
Testing Techniques 

A. Model Checking 

Model checking, a term coined by Clarke and Emerson [1],
is an automatic verification technique for finite-state
concurrent systems. The system or program to be verified is
first formalized as a mathematical model, often in the form of
a Kripke structure [9]. Basically, a Kripke structure can be
considered as a nondeterministic finite automaton on which
temporal logic [10] can be applied to verify a certain
characteristic of an input string.

Compared to other verification techniques, model checking
offers the practically useful advantage of producing a
counter-example when it detects a system error. Thus, the error
can be inspected in an obvious and convincing manner. Many
attempts have been made to improve the capability of generating
counter-examples for model checking, ranging from symbolic
approaches [11] to probabilistic approaches [12] [13]. They can
be introduced as a general framework [14] or as work targeting
a specific model checking tool [15].

¹ The term incomplete here implies the lack of some parts of
the source code, not the completeness of the program in terms
of functionality.

B. The Concolic Testing Techniques 

LISTING I. An Illustrated Example of Concolic Testing 

void foo(int x, int y)
{
0:  if (x != y)
1:      if (2*x == x + 10)
2:          error();
}

Concolic testing, a hybrid software verification technique
that combines concrete execution with symbolic execution [7],
has recently emerged as an efficient technique for test-case
generation. Compared to traditional white-box testing, the
concolic technique attracts much attention due to its
capability of reducing the number of path conditions to be
explored.

For example, the program given in Listing I has two branch
conditions, (x != y) and (2*x == x+10). Traditional white-box
testing would have to consider three path conditions. Concolic
testing first picks arbitrary values for x and y, e.g. x=1 and
y=2. In the concrete execution, the test at line 1 is reached
since the condition x != y is true, but that test fails because
the condition 2*x == x+10 is false. Concurrently, the symbolic
execution follows the same path but treats x and y as symbolic
variables. The conjunction (x != y) ∧ (2*x != x+10) is called a
path condition. To make the verification follow a different
execution path on the next run, the approach takes the last
condition encountered, i.e. 2*x != x+10, and negates it,
producing 2*x == x+10. An automated theorem prover is then
invoked to find values for the input variables x and y that
satisfy the new condition (x != y) ∧ (2*x == x+10), for
instance x=10 and y=5. Running the program on this input
reaches the error. Thus, only two path conditions need to be
explored when using concolic testing.
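
The two runs described above can be replayed with a small C sketch. It is only an illustration under stated assumptions: foo_trace is an instrumented copy of foo from Listing I that reports the path taken instead of calling error(), and solve_negated is a hypothetical bounded search standing in for the automated theorem prover. Neither name comes from the original work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of one concrete run of foo() from Listing I. */
typedef enum { FAIL_AT_0, FAIL_AT_1, REACH_ERROR } foo_path;

/* Instrumented copy of foo(): reports the path instead of calling error(). */
foo_path foo_trace(int x, int y)
{
    if (x != y) {                    /* line 0 */
        if (2 * x == x + 10)         /* line 1 */
            return REACH_ERROR;      /* line 2 */
        return FAIL_AT_1;
    }
    return FAIL_AT_0;
}

/* Hypothetical stand-in for the theorem prover: a bounded search for inputs
 * satisfying (x != y) && (2*x == x + 10), i.e. the recorded path condition
 * with its last conjunct negated. */
bool solve_negated(int lo, int hi, int *x, int *y)
{
    assert(x != NULL && y != NULL);
    for (int i = lo; i <= hi; i++)
        for (int j = lo; j <= hi; j++)
            if (i != j && 2 * i == i + 10) {
                *x = i;
                *y = j;
                return true;
            }
    return false;
}
```

Calling foo_trace(1, 2) replays the first run, which stops at the test on line 1; feeding the values found by solve_negated back into foo_trace replays the second run, which reaches the error.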

III. Motivating Example: Model Checking on Incomplete 
Program 

A. Incomplete program 

To illustrate the motivation of this research, let us consider
the verification of the modular program presented in
Listing II. In the original program, given in Listing II(a),
the main function calls the two functions func1 and func2 in
turn. Of these, func1 is a simple library function and func2
performs a critical task which needs to be verified carefully.
It is assumed that if the value T_error is passed to func2, the
simulated model checking can detect a real bug of the program
accordingly.
LISTING II. Transforming A Modular Program for Model Checking

int func1(int n)
{
    if (n > 10) return 2*n;
    else if (n < 5) return T_error;
    else return n;
}

void func2(int b)
{
    /* doing some critical tasks T; an error will be
       detected if the input is T_error */
    T;
}

void main()
{
    int a = 1, b = 1;
    read_from_input(&a);
    if (a > 0) b = func1(a);
    func2(b);
    return OK;
}

(a) Modular program

void main()
{
    int a = 1, b = 1;
    read_from_input(&a);
    if (a > 0) {
        if (a > 10) b = 2*a;
        else if (a < 5) b = T_error;
        else b = a;
    }
    T'(b);
    return OK;
}

(b) Transformed program

requires n > 0
ensures (n>10 => \result==2*n) && (n<5 => \result==T_error)
        && ((n<=10 && n>=5) => \result==n)
int func1(int n);

void main()
{
    int a = 1, b = 1;
    read_from_input(&a);
    if (a > 0) b = func1(a);
    func2(b);
    return OK;
}

(c) Incomplete program


Typically, in order to check this program using a model
checking technique, one needs to transform it into a
non-modular form similar to that given in Listing II(b). When
the concolic approach is applied to the transformed program,
four path constraints are generated: (i) a>10; (ii) a<=10 ∧
a>=5; (iii) a<5 ∧ a>0; and (iv) a<=0. Solving constraint (iii)
gives the test-case leading to the error, e.g. a=3. In
Listing II(b), T' denotes the transformed code of T.

However, in order to perform the transformation as
discussed, one would need the full source code of func1. This
is not always possible in practical situations, where func1 may
be a commonly used function from an existing library. In other
words, one may need to verify a program without its full
source code. In this paper, this situation is regarded as the
verification of an incomplete program.

As illustrated in Listing II(c), we do not possess the source
code of func1, which is supposedly called from a dynamic
library. This poses two major problems when one wants to verify
the program: (i) the lack of transformed code to be further
processed by a model checking tool; and (ii) the lack of enough
test-cases to verify the corresponding model sufficiently.

To see problem (ii) more clearly, consider using concolic
testing on the program in Listing II(c). Two test-cases will be
generated, corresponding to the two constraints a>0 and a<=0.
Obviously, there is a risk that if the two concrete test-cases
are a=17 and a=-2, the potential error arising when func1
returns T_error will not be detected.
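
The risk can be made concrete with a short C sketch. It is a minimal illustration, assuming that T_error is encoded as -1 and that the body of func1 from Listing II(a) stands in for the binary-only library function; value_for_func2 is a hypothetical helper that mirrors main in Listing II(c) and returns the value that would be passed to func2.

```c
#include <assert.h>

#define T_ERROR (-1)   /* assumed encoding of T_error, not given in the paper */

/* Stand-in for the binary-only library function. In the incomplete program
 * the verifier cannot see this body, only its specification. */
int func1(int n)
{
    assert(n > 0);                 /* pre-condition: requires n > 0 */
    if (n > 10) return 2 * n;
    else if (n < 5) return T_ERROR;
    else return n;
}

/* Value that main() in Listing II(c) passes to func2 for a given input a. */
int value_for_func2(int a)
{
    int b = 1;
    if (a > 0) b = func1(a);
    return b;
}
```

With the two test-cases a=17 and a=-2, value_for_func2 yields 34 and 1, so T_ERROR never reaches func2; only an input in the range 0 < a < 5, such as a=3, exposes the error.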

B. Generation of combined constraint and simulated results 

Our approach to overcoming this problem is to assume that all
library functions are well-defined, i.e. their semantics can be
annotated using pre-conditions and post-conditions as depicted
in Listing II(c). By combining the pre/post-conditions with the
path constraints of the program, one can obtain combined
constraints sufficient for test-case generation. For example,
in Listing II(c), analyzing the path constraints yields
(x1): a>0 and (x2): a<=0. Analyzing the pre/post-conditions
adds the following constraints: (p1): a>10; (p2): a<=10 ∧ a>=5;
and (p3): a<5 ∧ a>0². Then, we generate the following valid
combined constraints: (x1∧p1): a>10; (x1∧p2): a<=10 ∧ a>=5;
(x1∧p3): a<5 ∧ a>0; and (x2): a<=0. Using a solver on those
constraints, sufficient test-cases are obtained, e.g. a=12,
a=7, a=3 and a=-4³.

LISTING III. Generated Self-complete Programs for Model Checking

void main()
{
    int a = 1, b = 1;
    a = 12; // testcase 1
    if (a > 0) b = 24;
    T'(b);
    return OK;
}

(a) Simulated program with test-case a = 12

void main()
{
    int a = 1, b = 1;
    a = 7; // testcase 2
    if (a > 0) b = 7;
    T'(b);
    return OK;
}

(b) Simulated program with test-case a = 7

void main()
{
    int a = 1, b = 1;
    a = 3; // testcase 3
    if (a > 0) b = T_error;
    T'(b);
    return OK;
}

(c) Simulated program with test-case a = 3

void main()
{
    int a = 1, b = 1;
    a = -4; // testcase 4
    if (a > 0) b = unknown;
    T'(b);
    return OK;
}

(d) Simulated program with test-case a = -4
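
The valid combined constraints for Listing II(c) can be written down as plain C predicates. This is a sketch under stated assumptions: the boundaries follow the ensures clause of func1 (n>10, 5<=n<=10, n<5) with n replaced by a, the names combined, satisfies, solve_combined and regions_containing are hypothetical, and the bounded linear search merely stands in for a real constraint solver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The four valid combined constraints of the motivating example. */
typedef enum { X1_P1, X1_P2, X1_P3, X2, N_COMBINED } combined;

/* Membership test: does input a satisfy combined constraint c? */
bool satisfies(combined c, int a)
{
    switch (c) {
    case X1_P1: return a > 0 && a > 10;
    case X1_P2: return a > 0 && a <= 10 && a >= 5;
    case X1_P3: return a > 0 && a < 5;
    case X2:    return a <= 0;
    default:    return false;
    }
}

/* Hypothetical solver stand-in: smallest a in [lo, hi] satisfying c. */
bool solve_combined(combined c, int lo, int hi, int *a)
{
    assert(a != NULL);
    for (int v = lo; v <= hi; v++)
        if (satisfies(c, v)) {
            *a = v;
            return true;
        }
    return false;
}

/* Number of combined constraints that a satisfies. */
int regions_containing(int a)
{
    int count = 0;
    for (int c = 0; c < N_COMBINED; c++)
        if (satisfies((combined)c, a))
            count++;
    return count;
}
```

The test-cases quoted in the text (12, 7, 3 and -4) each satisfy exactly one of the four predicates, which is what makes one test-case per combined constraint sufficient.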

In the next step, to tackle the source-missing problem of
func1, this function is invoked with the specific test-cases
generated above. As a result, we acquire the actual outputs,
which are 24, 7 and T_error respectively. The test-case a=-4
is not used, since symbolic execution shows that it cannot
pass the checking condition to reach the function call.

Next, instead of transforming the source code, the result
of func1 is simulated using the outputs just obtained. As a
result, four new programs are generated whose source code is
self-complete, as presented in Listing III. Hence, we can
identify the bug when processing the program in
Listing III(c).
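
How such a self-complete program might be produced mechanically is sketched below. This is an assumption-laden illustration rather than the tool used in the paper: emit_simulated_main and program_contains are hypothetical helpers, and the template simply hard-codes the shape of main from Listing III, substituting the test-case for the input and the recorded output for the call to func1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes a simulated main() for test-case a into buf, with the call to func1
 * replaced by func1_result (the textual form of the recorded output).
 * Returns false if the program text does not fit into buf. */
bool emit_simulated_main(char *buf, size_t size, int a, const char *func1_result)
{
    assert(buf != NULL && func1_result != NULL);
    int n = snprintf(buf, size,
                     "void main()\n"
                     "{\n"
                     "    int a = 1, b = 1;\n"
                     "    a = %d;\n"
                     "    if (a > 0) b = %s;\n"
                     "    T'(b);\n"
                     "    return OK;\n"
                     "}\n",
                     a, func1_result);
    return n >= 0 && (size_t)n < size;
}

/* Convenience check used to inspect the generated text. */
bool program_contains(const char *program, const char *fragment)
{
    return strstr(program, fragment) != NULL;
}
```

For instance, calling emit_simulated_main with a=3 and the string "T_error" produces the text of Listing III(c), which a model-oriented translator could then process without access to func1.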

IV. Specification-based Model Checking Framework 

The proposed verification framework is shown in Fig. 1. As
discussed in the motivating example, the verification of
incomplete programs consists of the following major steps:

• Specification-based Test-case Generation: generates
test-cases based on the combination of constraints inferred
from the function specification with those from control-flow
analysis.

• Code Transformation: once the test-cases are generated,
the calls to source-missing library functions are replaced by
the concrete values those functions produce when invoked with
the test-cases.

• Model-oriented Translation: translates the self-complete
programs into model descriptions, on which a model checker
performs verification to find real bugs, if any.

Figure 1. The specification-based model checking framework

V. Efficient Constraint-based Test-case Generation

Combining the constraints generated from the function
specification with those from the program flow leads to
exponential complexity if all possible combinations of
constraints must be explored.

To reduce the complexity, we introduce an algorithm named
CTG_E (Efficient Constraint-based Test-case Generation), which
is shown in Fig. 2. CTG_E aims at producing test-cases from the
combination of two sets of constraints. The main idea of CTG_E
is that it does not try to build all possible combined
constraints. Instead, it processes each constraint of one of
the sets in turn. For each path condition, CTG_E first produces
an appropriate test-case. Then, it calls a sub-procedure named
combine for further processing.

² The parameter n in the pre/post-conditions is replaced by a
in the constraints when performing the inter-procedural analysis.

³ There would be 2 x 4 = 8 possible combinations of constraints, among
which only 4 are satisfiable. In Section V we discuss how to reduce
the complexity of constraint combination.
For every test-case t processed in combine, a function named
symbolic_exec is called to find the constraints to which t
belongs. symbolic_exec performs symbolic execution, a classical
technique that traces the execution path of a given input by
tracking symbolic rather than actual values. Based on that,
combine keeps generating relevant constraints and calls itself
recursively to generate more suitable test-cases. During the
whole process, CTG_E also makes use of a special constraint
named C_mark, which marks the explored parts of the test-case
domain. Therefore, CTG_E can avoid duplication when generating
constraints and test-cases.

Algorithm CTG_E (Efficient Constraint-based Test-case
Generation)

Input: V_P, V_E: two sets of path constraints
Output: T: set of test-cases
Operations
    T = ∅
    Foreach (path constraint α ∈ V_P)
        t = solve_constraint(α ∧ ¬C_mark)
        combine(t)
    End For

SubProcedure combine(test-case t)
Begin
    add t to T
    α = symbolic_exec(t, V_P)
    β = symbolic_exec(t, V_E)
    C_mark = C_mark ∨ (α ∧ β)
    if (α ∧ ¬β ∧ ¬C_mark) ≠ ∅ then
        combine(solve_constraint(α ∧ ¬β ∧ ¬C_mark))
    End if
    if (¬α ∧ β ∧ ¬C_mark) ≠ ∅ then
        combine(solve_constraint(¬α ∧ β ∧ ¬C_mark))
    End if
End

Figure 2. Efficient Constraint-based Test-case Generation (CTG_E)
algorithm

For example, let us consider the two following sets of
constraints P = {P1 = (n>0); P2 = ¬(n>0)} and Q = {Q1 = (n>3);
Q2 = ¬(n>3)}. First, CTG_E randomly generates a test-case for a
constraint. Let it be n=4 for P1. Performing symbolic execution
on the test-case, one can see that the test-case falls into
the combined constraint P1∧Q1 = n>0 && n>3 = n>3. Then, CTG_E
tries to solve the formula P1∧¬Q1∧¬C_mark, with C_mark being
updated as C_mark = P1∧Q1. We have P1∧¬Q1∧¬C_mark = n>0 &&
¬(n>3) && ¬(n>3) = n>0 && n<=3. Then, a test-case is generated
accordingly, e.g. n=2.

Next, combine(2) is invoked, which corresponds to the
constraint P1∧Q2, with C_mark being updated as n>3 ∨ (n>0 &&
n<=3) = n>0. We then have P1∧¬Q2∧¬C_mark = n>0 && n>3 &&
!(n>0) = ∅, so this formula is not considered.

Meanwhile, we have ¬P1∧Q2∧¬C_mark = !(n>0) && !(n>3) &&
!(n>0) = n<=0. Solving this constraint, one gets, for instance,
a new test-case n=-7. Then, combine(-7) is invoked accordingly.
At this point, C_mark is updated as n>0 ∨ (!(n>0) && !(n>3)) =
n>0 ∨ n<=0, making P2∧¬Q2∧¬C_mark = ¬P2∧Q2∧¬C_mark = ∅. Thus,
the algorithm stops with no more test-cases generated.
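
The walk-through above can be reproduced with a compact C sketch of the recursion. It rests on several assumptions that are not part of the original algorithm description: inputs are single integers in a small bounded domain, constraints are C predicates, symbolic_exec is approximated by evaluating the predicates on the concrete test-case, solve is a linear search standing in for the embedded solver, and C_mark is stored as a table of visited (α, β) pairs. The main loop also skips a constraint of V_P whose unexplored part is empty. All identifiers are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DOM_LO (-10)    /* assumed bounded input domain */
#define DOM_HI 10
#define MAX_SET 8
#define MAX_TESTS 64

typedef bool (*pred)(int);

typedef struct {
    const pred *vp; size_t np;              /* V_P */
    const pred *ve; size_t ne;              /* V_E */
    bool mark[MAX_SET][MAX_SET];            /* C_mark: visited (alpha, beta) pairs */
    int tests[MAX_TESTS]; size_t ntests;    /* T */
} ctg;

/* Stand-in for symbolic_exec: index of the constraint in set that t satisfies. */
static int symbolic_exec(int t, const pred *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i](t)) return (int)i;
    return 0;   /* not reached when the set partitions the domain */
}

static bool in_mark(const ctg *c, int t)
{
    return c->mark[symbolic_exec(t, c->vp, c->np)][symbolic_exec(t, c->ve, c->ne)];
}

/* Stand-in for solve_constraint: find n outside C_mark with vp[a](n) == want_a
 * and, when b >= 0, ve[b](n) == want_b. */
static bool solve(const ctg *c, int a, bool want_a, int b, bool want_b, int *out)
{
    for (int n = DOM_LO; n <= DOM_HI; n++) {
        if (c->vp[a](n) != want_a) continue;
        if (b >= 0 && c->ve[b](n) != want_b) continue;
        if (in_mark(c, n)) continue;
        *out = n;
        return true;
    }
    return false;
}

static void combine(ctg *c, int t)
{
    int next;
    if (c->ntests < MAX_TESTS) c->tests[c->ntests++] = t;    /* add t to T */
    int a = symbolic_exec(t, c->vp, c->np);
    int b = symbolic_exec(t, c->ve, c->ne);
    c->mark[a][b] = true;                                    /* update C_mark */
    if (solve(c, a, true, b, false, &next)) combine(c, next);   /* a, not b */
    if (solve(c, a, false, b, true, &next)) combine(c, next);   /* not a, b */
}

void ctg_run(ctg *c, const pred *vp, size_t np, const pred *ve, size_t ne)
{
    assert(np <= MAX_SET && ne <= MAX_SET);
    memset(c, 0, sizeof *c);
    c->vp = vp; c->np = np; c->ve = ve; c->ne = ne;
    for (size_t a = 0; a < np; a++) {
        int t;
        if (solve(c, (int)a, true, -1, false, &t)) combine(c, t);
    }
}

/* The constraint sets P and Q of the example. */
static bool p1(int n) { return n > 0; }
static bool p2(int n) { return !(n > 0); }
static bool q1(int n) { return n > 3; }
static bool q2(int n) { return !(n > 3); }
static const pred EX_VP[] = { p1, p2 };
static const pred EX_VE[] = { q1, q2 };
```

Running ctg_run on EX_VP and EX_VE yields three test-cases, one for each solvable combined constraint (P1∧Q1, P1∧Q2 and P2∧Q2). The concrete values differ from those in the text because the search here always returns the smallest unexplored input rather than a random one.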
Theorem 1. The set of test-cases generated by the CTG_E
algorithm is sufficient to cover all possible valid combined
constraints.

Proof. Consider an undirected graph G = <V,E> constructed as
follows: each vertex v in V corresponds to a solvable combined
constraint generated by the CTG_E algorithm. An edge
e_ij = (v_i, v_j) is added to E if v_i and v_j are
sub-constraints of a constraint in either V_P or V_E.

For example, Fig. 3 shows the graph constructed for the
constraint sets P and Q discussed previously. The graph has
three vertices corresponding to the three solvable constraints.
There is an edge connecting v1 and v2 since their constraints
are both sub-conditions of P1. Similarly, v2 and v3 are
connected since their constraints are both sub-conditions of
Q2.

A vertex v is considered visited if CTG_E produces a test-case
satisfying the corresponding combined constraint of v. If all
vertices in G are visited after CTG_E finishes, then CTG_E
generates sufficient test-cases to cover all possible valid
combined constraints.

When CTG_E begins, it starts with a test-case t generated to
satisfy a path constraint α of V_P. Using symbolic execution,
one can determine the path condition β of V_E to which t
belongs. This means that the vertex q = α∧β has just been
visited.

Consider the formula α∧¬β referring to a vertex q', which
should be connected to q since α∧β and α∧¬β are both
sub-constraints of α. Let C_mark be the formula representing
all vertices already visited (i.e. the combined constraints
whose corresponding test-cases have already been generated). By
similar reasoning, we obtain that the two formulas
α∧¬β∧¬C_mark and ¬α∧β∧¬C_mark represent all vertices connected
to q which have not been visited. By recursively solving those
formulas and updating C_mark in the sub-procedure combine,
CTG_E iteratively visits all vertices in the connected
component to which q belongs.

Figure 3. A graph representation of combined constraints

Lastly, one can note that by checking all constraints of
V_P, CTG_E travels to all possible connected components of G.
Thus, all vertices of G are visited when CTG_E is performed,
and no vertex is visited twice. □

Complexity Analysis. An elementary analysis shows that CTG_E
invokes the embedded solver 2K times, where K is the number of
test-cases generated and K<N+M, with N and M being the numbers
of path constraints in V_P and V_E respectively. If we take
into account the cost of generating the path conditions, the
total complexity of CTG_E is O(2K+M) ~ O(3N+M), a significant
improvement over that of the original CTG.




ACEEE Int. J. on Information Technology, Vol. 02, No. 02, April 2012 



VI. Experiments 

The approach in this paper was tested in the practical
setting of evaluating the programming exercises of university
students. The data set was collected from the programming work
submitted by students of the Faculty of Computer Science and
Engineering, Ho Chi Minh City University of Technology. The
requirements to be fulfilled in this experiment are
non-trivial programming problems given to students. The list
of problems is given in Table I, which also gives the number of
combined constraints made from path conditions. For loop-based
programs, the path constraints are computed using the coverage
analysis technique [8], in which the loops are forced to repeat
0, 1, 2 and more than 2 times respectively. Thus, the algorithm
may have some limitations on programs with complicated loops.

The dataset used in this experiment was collected from the
work of 50 students and consists of actual graded programming
assignments. For each programming problem, we annotate the
students' work so that it can be marked automatically using
verification tools. However, as students have recently been
allowed to use library functions, e.g. the mathematical
functions defined in <math.h>, our existing approach, which
relies purely on model checking, is hindered significantly.
With the framework proposed in this paper, we can now evaluate
student work automatically.
We also compare the performance of our approach with the
typical white-box approach. Manual inspection showed that only
89% of the students' bugs were detected using the white-box
approach. Exact information on the improvement in bug detection
is given in Table II. When the constraint-based approach is
applied, with the teachers' sample solutions playing the role
of original versions and the students' work that of evolved
versions, bug detection improves significantly, with 98% of
bugs detected. A few bugs are still missed because our solver
fails to resolve some complex non-linear expressions in path
conditions.

TABLE I.

Programming Problems Used as Experimental Data

No  Problem                   Constraints  Solver calls   Solver calls
                                           (brute-force)  (CTG_E)
1   Leap year checking        14           42             40
2   Triangle classification   [illegible]  [illegible]    [illegible]
3   Date validation checking  [illegible]  716            [illegible]
4   Time validation checking  [illegible]  96             [illegible]
5   Factorial computing       [illegible]  96             58
6   [illegible]               [illegible]  96             [illegible]
7   Prime number              56           [illegible]    91
8   Sum of 1..n               [illegible]  [illegible]    54

TABLE II.

Bugs Detected by Brute-force and Combined Constraints Approach

Problem No  Real bugs    Detected by   Detected by
                         white-box     CTG_E
1-7         [illegible]  [illegible]   [illegible]
8           12           12            11
Total       96           86 (89%)      94 (98%)



VII. Conclusions 

In this paper, we present an approach to verifying incomplete
programs, which reflects the practical situation that the
source code of a whole software project may not always be
available. The approach is based on the concolic technique. In
particular, to tackle the problem of analyzing source-missing
functions, we propose combining the function specifications
with the program flow. Thus, we can still generate sufficient
test-cases covering all real execution scenarios.

Our approach has been applied in practice to checking the
programming work of students, where the submitted code often
involves source-missing library functions. In the experiments,
the approach detected 98% of the bugs, compared with 89% for
the white-box approach.